Abstract

We investigate the trading behavior of a large set of single investors trad- 
ing the highly liquid Nokia stock over the period 2003-2008 with the aim of 
determining the relative role of endogenous and exogenous factors that may 
affect their behavior. As endogenous factors we consider returns and volatil- 
ity whereas the exogenous factors we use are the total daily number of news 
and a semantic variable based on a sentiment analysis of news. Linear regres- 
sion and partial correlation analysis of data show that different categories of 
investors are differently correlated to these factors. Governmental and non 
profit organizations are weakly sensitive to news and returns or volatility, 
and, typically, they are more correlated with the former than with the latter. 
Households and companies, on the contrary, are very sensitive to both en- 
dogenous and exogenous factors, and volatility and returns are, on average, 
much more relevant than the number of news and sentiment, respectively. 
Finally, financial institutions and foreign organizations are intermediate be- 
tween these two cases, in terms of both the total explanatory power of these 
factors and their relative importance. 
1. Introduction



The efficient market hypothesis assumes that financial markets discount 
immediately all information available about the listed assets. In general the 
flux of information is both endogenous and exogenous and this leads to a 
classification of different forms of efficiency. It is natural therefore that a 
large empirical literature exists with the aim of measuring the role of exoge- 
nous and endogenous sources of information in explaining price dynamics, 
both in absolute and in relative terms. The seminal work of Cutler et al.
(1989) started a stream of research trying to connect exogenous news with
price movements. As detailed in the literature review section below, more
recent papers have investigated stock price reaction to news (e.g., Chan,
2003; Vega, 2006), the correlation between high or low media pessimism and
high market trading volume (Tetlock, 2007), the relation between the
sentiment of news, earnings, and return predictability (Tetlock et al.,
2008), the correlation between the volume of news searches in the Google
search engine and many financial indicators of stocks (Da et al., 2011), the
role of news in the trading action of short sellers (Engelberg et al., 2008),
the role of macroeconomic news in the performance of stock returns (Birz &
Lott, 2011), and the high frequency market reaction to news (Joulin et al.,
2008; Gross-Klussmann & Hautsch, 2011). All these papers are concerned with
the relation between news and price movements. A different and less explored
stream of research, to which the present paper belongs, investigates the role
of news in the trading and investment decisions of single investors. The
main difficulty of this type of research is the limited availability of micro
data about the activity of single investors. Recently, some papers have
investigated how news affects the selection of stocks by single investors
(Barber & Odean, 2008), the role of individual investor decisions in causing
post-earnings announcement drift (Hirshleifer et al., 2008), and the relation
between high news attention and the level of trading of single investors
(Yuan, 2012).

Financial markets are extremely heterogeneous systems, and investors 
represent an important source of heterogeneity. Investors are different in 
many respects, including their risk profile, the size of their investment, the 
regulatory constraints to which they are subject, the information they have 
access to, etc. One clear form of heterogeneity among investors, which in- 
corporates most of the aforementioned differences, is the category to which 
an investor belongs. A large financial institution is clearly different from a
household, or from a governmental institution, and this difference might be
reflected in the way each investor reacts to exogenous (news) or endogenous
(price returns or volatility) factors. The concept of category is related to a 
specific classification and it is not unique. In this paper, we will use a classifi- 
cation given by the data we are using, which allows us to discriminate between 
non-financial corporations, financial and insurance corporations, general gov- 
ernmental organizations, non-profit institutions, households, and foreign or- 
ganizations. 

More specifically, in the present study, we investigate how different cate- 
gories of single investors react to exogenous and endogenous factors by look- 
ing at their trading activity, both in terms of being active in the market and 
in terms of the decision to buy or to sell an asset. To this aim we make 
use of two very detailed datasets. We consider the Nokia stock and we have 
access to a database containing the trading activity of all the investors whose 
financial ownership is recorded by the Finnish Central Securities Depository. 
These data allow us to classify single investors in terms of the category men- 
tioned above, and to characterize the buying and selling activity on Nokia 
with a daily resolution. For the exogenous news we consider all the Thom- 
son Reuters news released during the time period 2003-2008 and containing 
information about the company Nokia. 

The number of daily news about Nokia gives us a signal about the in- 
tensity of exogenous information without interpreting the content of the 
news. To obtain a semantic interpretation of each news item and
to classify it as good or bad news, we use the General Inquirer
(http://www.wjh.harvard.edu/~inquirer/), a well-known content analysis
program that uses the General Inquirer categories from the Harvard
psychosocial dictionary. We construct a simple proxy of the sentiment of news
arriving into the market by applying the General Inquirer and measuring the 
absolute or relative difference between positive and negative words in the 
headline. 

The main analysis of the paper is a linear regression and partial corre- 
lation analysis which allows us to assess for each category of investors the 
absolute and relative role of endogenous (price return and volatility) and 
exogenous (number of news and sentiment indicator) factors in explaining 
the decision to trade and, when they trade, the decision to buy or to sell. 
As detailed below we find a different behavior among different categories of 
investors. Compared to other categories, governmental and non profit
institutions are in absolute terms less affected by endogenous and exogenous
factors. In relative terms, they are more affected by news than by price
dynamics. On the contrary, the trading action of households and non financial
companies is
significantly explained by the regression, but returns and volatility are more 
important than exogenous news. Finally, we show that financial institutions 
and foreign institutions display an intermediate behavior, but endogenous 
factors are more important than exogenous ones. 

The paper is organized as follows. In Section 2 we review the empirical
literature on news, price dynamics, and trading activity. In Section 3 we
describe the databases used in our study. In Section 4 we introduce the
variables used to characterize the trading activity of the investors and the
proxies used for the flux of news, volatility, and sentiment indicators.
Section 5 presents and discusses the results obtained from the regression
analysis, and Section 6 concludes.



2. Literature review 



There is a vast literature about the role of news in financial markets. We
can divide the literature into two streams of research. The first and older
one considers the problem of how news affects asset prices. The second stream
of research, more closely related to the present paper, considers the problem
of how single investors react to news announcements. This type of analysis
has become possible only recently because of the availability of large
datasets with records of the trading history of single investors. In this
section we review these two streams of literature.

Many papers have investigated price reaction to news since the pioneering
work of Cutler et al. (1989), which estimated for the first time the fraction
of the variation in aggregate stock returns that can be attributed to
economic news. A few years later, Ederington & Lee (1993) studied the impact
of scheduled macroeconomic news announcements on interest rate and foreign
exchange futures markets. Engle & Ng (1993) investigated how new information
is incorporated into volatility estimates in the presence of asymmetry in the
impact of news. Mitchell & Mulherin (1994) studied the relation between the
daily number of Dow Jones news and aggregate measures of market activity such
as trading volume and market returns.

Starting from 2003, new studies using comprehensive databases of news
appeared in the literature. Chan (2003) showed that stocks experiencing
negative returns concurrent with the arrival of a news story continued to
underperform their peers. The same public news database was successfully
used by Vega (2006), together with the estimation of the probability of
private information-based trading, to empirically measure the effect of
private and public information on post-announcement drift. The sentiment
carried by news impacting the market was first investigated by using the
daily content from a popular Wall Street Journal column (Tetlock, 2007). In
his study, Paul Tetlock found that news with negative sentiment predicts
downward pressure on market prices followed by a reversion to fundamentals.
In a subsequent study, Tetlock et al. (2008) examined whether quantitative
measures of the sentiment of the text of news can be used to predict
individual firms' accounting earnings and stock returns. They concluded that
the linguistic content of news captures otherwise hard to quantify aspects of
firms' fundamentals that are quickly reflected in stock prices. Another study
investigated the role of dissemination of information on security pricing
(Fang & Peress, 2009), showing that stocks with no media coverage earn higher
returns than stocks with high media coverage. The role of investors'
attention was also considered from the different perspective of information
demand in a study of the Google Search Volume Index (Da et al., 2011). In
this study, the authors related the Google Search Volume Index to a sample of
Russell 3000 stocks, showing that an increase in the Search Volume Index
predicts higher stock prices in the next two weeks and an eventual price
reversal.

The role of news in short sales was investigated by Engelberg et al. (2008),
who found that the negative relation between short sales and future returns
is of the order of twice as large on news days as on days without a
significant flux of news, and of the order of four times as large on days
with negative news. The analysis of a large electronic database of news made
it possible to investigate the role of news in high frequency trading on both
volatility (Joulin et al., 2008) and price formation and book dynamics
(Gross-Klussmann & Hautsch, 2011). Joulin et al. (2008) found that volatility
patterns around market endogenous jumps and around exogenous news are quite
different, with endogenous jumps followed by increased volatility and news
triggering periods of lower than average volatility. The role of high
frequency sentiment indicators on future price trends and bid-ask spreads has
been studied by considering the high frequency price evolution of twenty
stocks traded at the London Stock Exchange (Gross-Klussmann & Hautsch, 2011)
and the Reuters NewsScope Sentiment Engine, which is a pre-processed set of
news data and electronic tools analyzing textual information using linguistic
pattern recognition algorithms. Recent studies have also considered the role
of specific categories of news, such as macroeconomic news (Birz & Lott,
2011), and the role of investor sentiment on the market's mean-variance
trade-off (Yu & Yuan, 2011).

The literature on the role of news in the trading decisions and activity of
single investors is more limited. Barber & Odean (2008) tested and confirmed
the hypothesis that individual investors are net buyers of stocks frequently 
discussed in the news. They proposed a model of decision making in which in- 
dividual investors consider primarily those stocks having attention-grabbing 
qualities and preferential selection among them is exercised only after at- 
tention has limited the choice set. The response of individual investors to 
news was investigated by Yuan in a study that considered the trading and 
position information of 78,000 households investing in US markets from Jan- 
uary 1991 to December 1996 (Yuan, 2012). In his study, Yuan showed that 
the impact of attention is pervasive across the market. High attention causes
individual investors to reduce stock positions in good times and moderately 
increase stock positions in bad times. His results also indicate that attention 
is one source of the cost of monitoring portfolios and that investors sensitive 
to news are more subject to the disposition effect. Another investigation on 
the behavior of individual investors in the presence of public news studied 
the role of individual investors in causing post-earnings announcement drift 
(Hirshleifer et al., 2008). The authors found that individuals are net buyers
after both negative and positive bold earnings announcements.



3. Data 

In this paper we investigate the database maintained by Euroclear
Finland (previously Nordic Central Securities Depository Finland). The 
database is the central register of shareholdings for Finnish stocks and finan- 
cial assets in the Finnish Central Securities Depository. Practically all major 
publicly traded Finnish companies have joined the register. The register re- 
ports the shareholdings of all Finnish investors and of all foreign investors 
asking to exercise their voting rights. Both retail and institutional investors are
included. The database records official ownership of companies and financial 
assets and the trading records are updated on a daily basis according to the 
Finnish Book Entry System. The records include all the transactions, exe- 
cuted in worldwide stock exchanges and in other venues, which change the 
ownership of the assets. 
The database classifies investors into six main categories: non-financial
corporations, financial and insurance corporations, general governmental
organizations, non-profit institutions, households, and foreign organizations.
The database has been collected since January 1, 1995. In the present study we
investigate the market activity of investors trading the Nokia stock, which 
was, across the years under investigation, either the most capitalized stock 
or one of the most capitalized stocks in the Finnish stock market. 

While the database contains very detailed information about the Finnish 
domestic investors, foreign investors can choose to use nominee registration. 
In this case, the investor's book entry account provider aggregates all the 
transactions from all of its accounts, and a single nominee register coded 
identity contains the holdings of many foreign investors. This means that 
our results describe in a detailed way the actions of all the Finnish domestic 
investors and those foreign investors who do not use nominee registration, 
while a very small fraction of the coded identities correspond to a large 
aggregated ownership. 

For this reason, in the present study, we consider only the set of single
investors trading the Nokia stock during the period from January 2, 2003 to
December 30, 2008 (a set of 1,510 daily records) and we investigate all the
market transactions performed by them. Single investor means here a retail or
an institutional investor that does not use nominee registration (essentially
all the Finnish investors). The total number of investors is 141,190 and the
total number of transactions is 7,494,104. Table 1 reports the number of
investors, the number of transactions, and the traded volume for the six
categories.

Table 1: Summary of the number of investors (# ids), the number of
transactions (N), and the exchanged volume (V) for the Nokia single investors
in the period Jan. 2, 2003 - Dec. 30, 2008. Volume is given in millions of
shares. The investors are divided into the six categories. Nominee registered
investors are not considered. Note that for transactions between two single
investors the volume is counted twice, once for the buyer and once for the
seller.

Category          # ids          N         V
Companies         8,396  1,009,226     4,825
Financial           392  4,079,174    21,402
Governmental        124     39,278     1,985
Non profit          922     21,778       248
Households      129,952  1,555,096     1,993
Foreign           1,405    789,552     7,685
Total           141,190  7,494,104    38,138

In this paper we investigate the relation between the trading of Nokia
investors, the Nokia price dynamics, and the flux of news about Nokia. As a
source of news reaching financial markets worldwide we use the headlines of
the NewsScope archive of news released in English by Thomson Reuters during
the investigated time period. Specifically, from the complete NewsScope
archive we have extracted all English-language headlines labeled with at
least one Nokia Reuters Instrument Code (RIC) [1]. The set comprises 11,484
unique headlines. Each headline is associated with one or more release times
(multiple releases of the same headline are frequent). In case of multiple
releases of the same headline we use as the time of the headline the time of
the first release. In Fig. 1 we show the average daily pattern of the arrival
rate of Nokia news per minute as a function of the time of the day. Time is
computed in coordinated universal time (UTC) and is corrected for the setting
of daylight saving time in the UK and for the difference between UK and US
daylight saving time. The figure shows that news starts to arrive at a high
rate around 5.00 am UTC and the distribution is roughly flat until 4.30 pm,
which is the time of market closing in Europe. Spikes of the probability
density function are observed around the time of market opening (8.00 am) and
closing (4.30 pm) in Europe and opening (2.30 pm) and closing (9.00 pm) of
the US NYSE and NASDAQ markets. In the figure, opening and closing of markets
are indicated with vertical lines.

[1] The RICs used to extract the headlines are NOK.W, NOK1V.HE, NOK1V.AS,
NOKN.MX, NOKA.BA, NOKy.BE, NOK.MW, NOKy.F, NOK1VEUR.VIp, NOK1VEUR.Ip,
NOK1VEUR.STp, NOKS.HA, NOKS.H, NOKS.DE, NOKy.D, NOKAc.BA, NOKy.MU, NOKy.DE,
NOK1VM0110.HE, NOK1VEUR.PZ, NOK, NOK1VEUR.DEp, NOKI.ST, NOKS.BE, NOKS.F,
NOK.DF, NOK.N, NOKAd.BA, NOK.P, NOK.C, NOK1VEUR.MIp, 0HAF.L, NOK1VEUR.PAp,
NOK1V.MI, NOKS.D, NOKS.MU.

Figure 1: Average daily pattern of the arrival rate of news on the Nokia
company. The rate is measured in number of headlines per minute. The vertical
lines indicate the time of opening and closing of the European and New York
Stock Exchange markets. Data are adjusted for daylight saving time.

We assume that the vast majority of the Finnish investors are trading in
European markets and for this reason, in the present study, we consider only
the headlines reaching financial markets during European trading hours (from
8.00 am to 4.30 pm UTC). Note that a clear peak of news arrival is observed
around the closing time of European markets (4.30 pm). Since most of these
news items probably report how the market is closing, we repeated the
analysis after removing the last 10 minutes of the trading day, but the
results presented below remain essentially unchanged.
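
The headline preprocessing described above (timing a repeated headline at its first release and keeping only headlines released during European trading hours) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the (headline, timestamp) record layout, and the sample records are hypothetical, timestamps are assumed to be already in UTC, headlines are identified by their text, and the daylight saving corrections are not handled.

```python
from collections import Counter
from datetime import datetime, time

# European trading hours in UTC, as stated in the text.
OPEN, CLOSE = time(8, 0), time(16, 30)

def daily_headline_counts(releases):
    """Count unique headlines per day released within trading hours.

    `releases` is an iterable of (headline_text, utc_timestamp) pairs
    (a hypothetical layout). A headline released several times is timed
    at its first release.
    """
    first_release = {}
    for headline, ts in releases:
        if headline not in first_release or ts < first_release[headline]:
            first_release[headline] = ts
    counts = Counter()
    for ts in first_release.values():
        if OPEN <= ts.time() <= CLOSE:
            counts[ts.date()] += 1
    return counts

# Invented records, for illustration only.
sample = [
    ("Nokia Q1 profit rises", datetime(2005, 4, 21, 10, 5)),
    ("Nokia Q1 profit rises", datetime(2005, 4, 21, 10, 1)),      # earlier release, kept
    ("Nokia unveils new handset", datetime(2005, 4, 21, 17, 0)),  # after the close, dropped
    ("Nokia signs network deal", datetime(2005, 4, 22, 9, 15)),
]
print(daily_headline_counts(sample))
```

Dropping the last 10 minutes of the trading day, as in the robustness check mentioned above, would only require changing the `CLOSE` constant.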



4. Variables characterizing trading activity, price dynamics, and 
the flux of news 

In our analysis we consider three sets of variables, one containing vari- 
ables describing the trading action of the investors, one describing the price 
dynamics, and one describing the news feed. In this section we define the 
variables and describe some of their statistical properties. 

4.1. Definition of the variables

The first set of variables characterizes the trading activity of single
investors belonging to different categories. The high degree of heterogeneity
of investors in the frequency and volume of trading makes it difficult to
compare, for example, the activity of a household trading small volumes once
every three months with that of a financial institution that trades large
volumes every day. Since we are primarily interested in comparing the impact
of news on the daily trading of single investors, we use categorical
variables that describe their trading activity. Following Tumminello et al.
(2012), we use the daily categorical variables of Selling investors (S),
Buying investors (B), and Buying and Selling investors (BS).

The classification is obtained as follows: for each investor $i$ and each
trading day $t$, we consider the Nokia volume sold $V_s(i,t)$ and the Nokia
volume purchased $V_b(i,t)$ by investor $i$ on that day. This information is
then converted into a categorical variable with 3 states: primarily buying B,
primarily selling S, and buying and selling while approximately closing the
position BS. The conversion is done by using the ratio

$$q(i,t) = \frac{V_b(i,t) - V_s(i,t)}{V_b(i,t) + V_s(i,t)} \qquad (1)$$

We assign an investor a primarily buying state B when $q(i,t) > \theta$, a
primarily selling state S when $q(i,t) < -\theta$, and a buying and selling
state BS when $-\theta \le q(i,t) \le \theta$ with $V_b(i,t) > 0$ and
$V_s(i,t) > 0$. When $V_b(i,t) = V_s(i,t) = 0$ we consider the investor not
active on day $t$. Following Tumminello et al. (2012), in the present study
we set $\theta = 0.01$. Roughly speaking, investors in a buy (sell) state can
be seen as acting as net buyers (sellers), while investors in a buy and sell
state can be thought of as intermediaries or day traders.
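
The classification rule can be illustrated with a short sketch. It assumes that the ratio is the normalized difference between purchased and sold volume and uses the threshold of 0.01 quoted in the text; the function name and the use of `None` for inactive investors are illustrative choices, not part of the original study.

```python
def trading_state(v_buy, v_sell, theta=0.01):
    """Classify one investor-day as "B", "S", "BS", or None (not active).

    v_buy and v_sell are the volumes purchased and sold on that day.
    The ratio is assumed to be (v_buy - v_sell) / (v_buy + v_sell).
    """
    if v_buy == 0 and v_sell == 0:
        return None                      # investor did not trade
    ratio = (v_buy - v_sell) / (v_buy + v_sell)
    if ratio > theta:
        return "B"                       # primarily buying
    if ratio < -theta:
        return "S"                       # primarily selling
    # |ratio| <= theta < 1 implies both volumes are strictly positive
    return "BS"                          # buying and selling, nearly flat

print(trading_state(100, 0))     # only purchases
print(trading_state(0, 50))      # only sales
print(trading_state(1000, 995))  # almost balanced buying and selling
```

With a threshold this small, an investor is placed in the BS state only when purchases and sales nearly offset each other on the same day.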

We use the categorical variables associated with each investor $i$ for each
trading day $t$ to compute the time evolution of the number of investors of
a given category $K$ performing a specific trading action (buying, selling,
or buying and selling). Specifically, $N_B^K(t)$ is the number of investors
of category $K$ classified as buyers on day $t$, $N_S^K(t)$ is the number of
investors of category $K$ classified as sellers on day $t$, and
$N_{BS}^K(t)$ is the number of investors of category $K$ classified as buying
and selling on day $t$.

From these variables we obtain the derived variables

$$N^K(t) = N_B^K(t) + N_S^K(t) + N_{BS}^K(t) \qquad (2)$$

$$\Delta N_A^K(t) = N_B^K(t) - N_S^K(t) \qquad (3)$$

$$\Delta N_R^K(t) = \frac{N_B^K(t) - N_S^K(t)}{N^K(t)} \qquad (4)$$

The variable $N^K(t)$ quantifies the number of trading investors of category
$K$ without discriminating the nature of the trading action. The variables
$\Delta N_A^K(t)$ and $\Delta N_R^K(t)$ quantify the polarization of the
trading choices of investors of category $K$ towards a buying decision
(positive values) or a selling decision (negative values) in absolute and
relative terms, respectively. Note that $N^K(t)$ includes all the active
investors, including those in the BS state, whereas $\Delta N_A^K(t)$ and the
numerator of $\Delta N_R^K(t)$ are calculated by using only those in the B
and S states. However, the denominator of $\Delta N_R^K(t)$ contains all the
active investors.
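
As a hedged illustration of how these aggregate variables follow from the individual states, the sketch below computes the number of active investors and the absolute and relative imbalance for one category on one day. The function name and the convention of returning `None` for the relative imbalance when nobody trades are assumptions made for this example.

```python
from collections import Counter

def category_day_variables(states):
    """Return (n_active, delta_abs, delta_rel) for one category and one day.

    `states` lists the daily state of each investor in the category:
    "B", "S", "BS", or None for investors that did not trade.
    """
    counts = Counter(s for s in states if s in ("B", "S", "BS"))
    n_active = counts["B"] + counts["S"] + counts["BS"]
    delta_abs = counts["B"] - counts["S"]
    # The denominator counts all active investors, including the BS ones.
    delta_rel = delta_abs / n_active if n_active > 0 else None
    return n_active, delta_abs, delta_rel

print(category_day_variables(["B", "B", "S", "BS", None]))
```

In the example, four investors are active, buyers exceed sellers by one, and the relative imbalance is one quarter because the BS investor enters only the denominator.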

Market price dynamics is quantified by considering the daily return of the
Nokia stock traded at the Nordic Stock Exchange, i.e.

$$\mathrm{Ret}(t) = \log P(t) - \log P(t-1) \qquad (5)$$

where $P(t)$ is the closing price on day $t$. We also consider a proxy of the
daily volatility of Nokia at the Nordic Stock Exchange defined as

$$\mathrm{Vol}(t) = 2\,\frac{P_{max}(t) - P_{min}(t)}{P_{max}(t) + P_{min}(t)} \qquad (6)$$

where $P_{max}(t)$ and $P_{min}(t)$ are the highest and lowest price of Nokia
on day $t$, respectively.
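
The two price variables can be computed as in the following sketch. The return is the log difference of consecutive closing prices, as stated above; the volatility proxy is assumed here to be the daily high-low range divided by the average of the high and the low, which is one reading of the range-based definition. Function names and the sample prices are hypothetical.

```python
import math

def log_return(close_today, close_previous):
    """Daily log return from two consecutive closing prices."""
    return math.log(close_today) - math.log(close_previous)

def range_volatility(p_max, p_min):
    """Range-based volatility proxy: 2 * (high - low) / (high + low).

    This form is an assumption about the proxy, not a confirmed formula.
    """
    return 2.0 * (p_max - p_min) / (p_max + p_min)

# Invented prices, for illustration only.
print(log_return(15.30, 15.00))
print(range_volatility(15.45, 14.95))
```

A day with a high of 10.1 and a low of 9.9 gives a proxy of about 0.02, in line with the typical value of the order of 2 percent reported for this variable in the descriptive analysis.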
We finally consider two variables quantifying the flux of news about Nokia
arriving on day $t$. First of all, we consider the number $H(t)$ of Nokia
headlines released by Thomson Reuters between 8.00 am and 4.30 pm during
trading day $t$. This number is a measure of the "intensity" of news reaching
the market on a given day. However, headlines and the associated stories may
bear positive, negative, or doubtful information. In analogy to Tetlock et
al. (2008), we quantify the sentiment carried by the headlines by
constructing a sentiment proxy using the number of positive and negative
words present in them. Positive and negative words are detected by using the
General Inquirer (http://www.wjh.harvard.edu/~inquirer/), a well-known
content analysis program that uses the General Inquirer categories from the
Harvard psychosocial dictionary. Once the numbers of positive ($G(t)$) and
negative ($B(t)$) words contained in all the headlines on day $t$ have been
computed, we determine the variables

$$S_A(t) = G(t) - B(t), \qquad S_R(t) = \frac{G(t) - B(t)}{G(t) + B(t)} \qquad (7)$$

giving the absolute and relative sentiment, respectively, of the news on a
given day. To avoid spurious discretization effects in the calculation of
$S_R(t) \in [-1, 1]$ we require $G(t) + B(t) \ge 5$ to compute the sentiment
indicator. When $G(t) + B(t) < 5$ we set $S_R(t) = 0$. The time evolution of
$S_R(t)$ is therefore different from zero only when a significant number of
positive and negative words are detected in the headlines.
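
The sentiment proxies can be sketched in a few lines once the daily counts of positive and negative words are available. The word counting itself (done with the General Inquirer in the study) is not reproduced here; the function name and the parameter `min_words` are illustrative, with the default of 5 taken from the text.

```python
def daily_sentiment(n_positive, n_negative, min_words=5):
    """Return (absolute, relative) sentiment from daily word counts.

    The relative indicator is set to zero when fewer than `min_words`
    positive plus negative words are found, to avoid discretization noise.
    """
    s_abs = n_positive - n_negative
    total = n_positive + n_negative
    s_rel = s_abs / total if total >= min_words else 0.0
    return s_abs, s_rel

print(daily_sentiment(8, 2))  # enough words: relative sentiment is defined
print(daily_sentiment(2, 1))  # too few words: relative sentiment set to zero
```

The absolute indicator is always computed, so a day with few sentiment words still contributes a small nonzero absolute value while its relative value is forced to zero.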

4.2. Descriptive analysis

Table 2 shows the summary statistics of the investigated variables. We note
that for the number of active investors $N^K$ the mean is always larger than
the median, indicating a positive skew of the distributions. Moreover, the
standard deviations are typically quite large. The table shows that for the
governmental, non profit, and foreign categories, the relative imbalance
$\Delta N_R^K$ reaches the minimum and maximum values of $-1$ and $+1$,
respectively. The presence of these values indicates that, on some days, all
the active investors of the considered category took the same market
position. However, it is worth noting that the three categories presenting
this behavior are the ones having an average number of active investors lower
than ten (4.713, 5.278, and 8.356 for the governmental, non profit, and
foreign categories, respectively). In other words, the complete market
polarization in most cases involves a limited number of single investors. The
same is valid for the occurrence of the minimum value $-1$ observed for the
households category. This happened only once, during a market session that
involved only four household investors.

Figure 2: From top to bottom the figure shows the time series of the number
of Nokia headlines $H(t)$, the daily volatility $\mathrm{Vol}(t)$ of Nokia
stock, and the time series of $N^K(t)$ for the category of Financial
investors and for the category of Households investors.

Figure 2 shows the time series of the number of Nokia headlines $H(t)$, the
daily volatility $\mathrm{Vol}(t)$ of Nokia stock, and the time series of
$N^K(t)$ for the category of Financial investors and for the category of
Households investors. We note that the time series of $H(t)$ presents a
background of around 4 headlines per day and a series of spikes jumping to 20
headlines or more per day. The flux of news is not clustered in time; in
fact, the autocorrelation function is significant only at one lag. The time
evolution of the volatility proxy $\mathrm{Vol}(t)$ shows that this quantity
is also characterized by a typical value (of the order of 2 percent) and by
days of huge swings with values of $\mathrm{Vol}(t)$ of the order of 20
percent. The spikes of news and volatility are clearly correlated with the
spikes of $N^K(t)$ for all the categories of investors. In the figure we show
the time evolution of the number of financial and household investors. The
overall behavior of the other categories is similar. It is worth noticing
that the autocorrelation of $N^K$ is quite persistent and is statistically
significant at $2\sigma$ for more than 30 days.
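
The persistence statements above rest on the sample autocorrelation function and a two standard deviation significance band. The sketch below shows one common way to compute both; the band of 2 divided by the square root of the sample size is the usual approximation under a white-noise null and is an assumption here, since the exact test used is not specified. Function names and the toy series are hypothetical.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag.

    Assumes the series is not constant (nonzero variance).
    """
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance

def significant_lags(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 2-sigma band."""
    band = 2.0 / math.sqrt(len(series))
    return [lag for lag in range(1, max_lag + 1)
            if abs(autocorrelation(series, lag)) > band]

# Toy alternating series: strongly (anti)correlated at every lag.
toy = [1.0, -1.0] * 50
print(autocorrelation(toy, 1), significant_lags(toy, 3))
```

Applied to a daily series of active investors, a long list of significant lags would correspond to the persistence described in the text, while a list containing only the first lag would match the behavior reported for the flux of news.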

In Fig. 3 we show the time series of the difference $S_A(t)$ between the
number of positive and negative words in Nokia headlines, of the daily return
$\mathrm{Ret}(t)$ of Nokia stock, and of $\Delta N_A^K(t)$ for the category
of Financial investors and of Households investors. The time evolution of
$S_A(t)$ fluctuates around zero but also presents a series of positive and
negative spikes reaching levels of the order of 20 for positive or 10 for
negative words per day. The time evolution of the Nokia return is
characterized by a non-Gaussian profile of the return probability density
function and by volatility clustering. Spikes of $\Delta N_A^K(t)$ are also
detected but they are in general less pronounced than in the case of
$N^K(t)$ (see the bottom two panels of Fig. 2), suggesting that the
interpretation of news and/or endogenously extracted market information is
usually different among investors of the same category. However, some
pronounced spikes are still observed, showing that on some occasions the
investors of a category take the same kind of trading action. Moreover,
autocorrelation analysis shows that for financial institutions
$\Delta N_A^K$ is persistent only at one lag, while for households the
autocorrelation function is statistically significant up to 30 days. This
fact indicates the presence of persistent "moods" that the household
investors are following in their trading actions.

In order to investigate the difference between absolute and relative
variables we show in Fig. 4 the time series of the relative sentiment
indicator $S_R(t)$, the daily return $\mathrm{Ret}(t)$ of Nokia stock, and
the time series of $\Delta N_R^K(t)$ for the category of Financial investors
and of Households investors. The time series of $S_R(t)$ presents a series of
positive and negative spikes often clustered in time and value. Quite
interestingly, the time series of $\Delta N_R^K(t)$ does not present spikes
but rather a noisy oscillation around zero. As in the case of
$\Delta N_A^K$, the time evolution for households shows a slow dynamics
lasting up to several trading months.

5. Regression results 

In this Section we use regression and partial correlation analysis to as- 
sess the role of endogenous or exogenous factors in determining the trading 
behavior of single investors. More specifically, we consider first of all the 
decision to trade (irrespective of the specific position taken) and we regress
it against an endogenous factor, namely volatility, and against an exogenous 
factor, namely the number of news. Similarly, in a second step we consider 
the imbalance (absolute or relative) between buyers and sellers and we regress 
it against contemporaneous return and the sentiment indicator. Also in this 
case, the first regressor can be considered endogenous and the second one 
as exogenous. Before presenting the results of the regression, two comments 
are in order. First of all, the distinction between exogenous and endogenous 
is not clear cut and in fact, as we will see, in both cases the regressors are 
not independent. Second, we shall consider contemporaneous variables and 
therefore it should be clear that no causality can be attached to our results. 
To be more explicit, the fact that we find a significant relation in the regres- 
sion between number of investors and volatility does not necessarily imply 
that volatility triggers people to trade, but it can be the other way around, 
i.e. a large number of investors increases the volatility through their trad- 
ing. Another possibility is that the two variables are influenced by a third 
unobserved factor. 

5.1. Volatility and number of news 

Let us first consider how the decision to trade (irrespective of being
a buyer or a seller) is related to volatility Vol and the flux of news H
for different categories of investors. Volatility and news are different sources 
of information, one primarily endogenous and one primarily exogenous to 
the market, but they are not mutually independent. In fact the empirical
correlation between the two variables is Corr[H, Vol] = 0.501. To interpret
the role of the two variables in trading, we consider the linear model

$$\tilde{N}^K(t) = \alpha_H \tilde{H}(t) + \alpha_{Vol} \widetilde{Vol}(t) + \epsilon(t) \tag{8}$$

where $\tilde{N}^K$, $\tilde{H}$, and $\widetilde{Vol}$ are standardized
versions with zero mean and unit variance of $N^K$, $H$, and $Vol$,
respectively.

In Table [3] we show the values of the $\alpha_H$ and $\alpha_{Vol}$
coefficients together with the variance of the residuals obtained by ordinary
least squares. In the Table we also report the 5%-95% confidence interval of
each coefficient under two hypotheses. The first is the customary assumption
of Gaussian errors, while the second is obtained by bootstrapping the data and
therefore takes into account their distributional properties. We show the
results for each category of investors separately. All the regression
coefficients are statistically different from zero (at the given confidence
level). The $\alpha_H$ coefficient ranges from a minimum value of 0.158,
observed for the foreign organizations, to a maximum value of 0.319, observed
for the non profit organizations. The $\alpha_{Vol}$ coefficient ranges from a
minimum value of 0.192, observed for the governmental organizations, to a
maximum value of 0.627, observed for households. The variance of the residuals
of $\tilde{N}^K$ ranges from 41.4% (households) to 86% (governmental
organizations). Since the variables are standardized, this means that the two
regressors explain between 14% and 58.6% of the variance, so that in most
cases their explanatory value is quite significant.
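
The fitting procedure described above can be illustrated with a minimal sketch. It is not the code used for Table [3]: the data are synthetic, the function names (`standardize`, `fit_two_factors`, `bootstrap_interval`) are hypothetical, and the bootstrap is assumed to resample days with replacement, which ignores any serial dependence in the series.

```python
import numpy as np

def standardize(x):
    """Rescale a series to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fit_two_factors(n, h, vol):
    """OLS fit of standardized n on standardized h and vol.

    Returns (alpha_h, alpha_vol) and the residual variance, which for
    standardized variables is the unexplained fraction of variance.
    """
    y = standardize(n)
    X = np.column_stack([standardize(h), standardize(vol)])
    alphas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alphas, np.var(y - X @ alphas)

def bootstrap_interval(n, h, vol, n_boot=500, seed=0):
    """5%-95% percentile interval, resampling days with replacement (assumed scheme)."""
    rng = np.random.default_rng(seed)
    data = np.column_stack([n, h, vol]).astype(float)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        sample = data[rng.integers(0, len(data), size=len(data))]
        draws[b] = fit_two_factors(sample[:, 0], sample[:, 1], sample[:, 2])[0]
    return np.percentile(draws, [5, 95], axis=0)

# Synthetic stand-in data (NOT the Nokia data): two correlated regressors.
rng = np.random.default_rng(42)
T = 1500
vol = rng.standard_normal(T)
h = 0.5 * vol + np.sqrt(0.75) * rng.standard_normal(T)  # Corr[h, vol] close to 0.5
n = 0.2 * h + 0.6 * vol + 0.7 * rng.standard_normal(T)

alphas, resid_var = fit_two_factors(n, h, vol)
interval = bootstrap_interval(n, h, vol)
print("alpha_H, alpha_Vol:", alphas)
print("residual variance:", resid_var)
print("5%-95% bootstrap interval (rows: lower, upper):")
print(interval)
```

Because all series are standardized, no intercept is needed and the residual variance can be read directly as one minus the explained fraction of variance.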

For four categories of investors (companies, financial institutions,
households and foreign organizations), the $\alpha_{Vol}$ coefficient is
higher than $\alpha_H$, indicating that for these investors the market
endogenous information has, on average, a higher explanatory role in their
decision to trade than the market exogenous information conveyed by news. The
case of governmental organizations and non profit organizations is different.
For these categories $\alpha_H$ is greater than $\alpha_{Vol}$, but the two
values are within the 5%-95% intervals of the estimated coefficients.
Moreover, for these categories the variance of the residual takes its largest
observed values, namely 86.0% for governmental organizations and 73.9% for
non-profit organizations.

We complement the regression analysis by computing the partial correlation
coefficients of the three variables $\tilde{N}^K$, Vol, and H. The partial
correlation coefficient $\rho(x, y|z)$ between variables x and y conditioning
on the variable z is the Pearson correlation coefficient between the residuals
of x and y that are uncorrelated with z. Partial correlation is clearly related
to linear regression. However, the information one obtains from the analysis
of regression coefficients is not identical to the one obtained by considering
partial correlation. In fact, for normalized variables, the best linear fit
$z = \alpha_x x + \alpha_y y + \epsilon$ gives

$$\frac{\alpha_x}{\alpha_y} = \frac{\rho(z, x|y)}{\rho(z, y|x)} \sqrt{\frac{1 - \rho_{y,z}^2}{1 - \rho_{x,z}^2}} \tag{9}$$

where $\rho_{x,z}$ and $\rho_{y,z}$ are the correlation coefficients between x
and z and between y and z, respectively (please refer to Appendix A for more
details). Only when $|\rho_{x,z}| = |\rho_{y,z}|$ is the ratio between the
regression coefficients equal to the ratio between the partial correlations.
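
The relation between regression coefficients and partial correlations can be checked numerically. The sketch below uses synthetic data and hypothetical names; it compares the ratio of the two fitted coefficients with the ratio of the two partial correlations multiplied by the factor $\sqrt{(1-\rho_{y,z}^2)/(1-\rho_{x,z}^2)}$ that follows from Appendix A.

```python
import numpy as np

def partial_corr(a, b, c):
    """Partial correlation of a and b given c, from pairwise correlations."""
    r_ab = np.corrcoef(a, b)[0, 1]
    r_ac = np.corrcoef(a, c)[0, 1]
    r_bc = np.corrcoef(b, c)[0, 1]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

def standardize(v):
    """Rescale a series to zero mean and unit variance."""
    return (v - v.mean()) / v.std()

# Synthetic data, for illustration only: x and y are correlated regressors.
rng = np.random.default_rng(1)
T = 2000
x = rng.standard_normal(T)
y = 0.5 * x + rng.standard_normal(T)
z = 0.3 * x + 0.6 * y + rng.standard_normal(T)
x, y, z = standardize(x), standardize(y), standardize(z)

# Best linear fit z = alpha_x * x + alpha_y * y + eps on standardized data.
alpha_x, alpha_y = np.linalg.lstsq(np.column_stack([x, y]), z, rcond=None)[0]

r_xz = np.corrcoef(x, z)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]
coef_ratio = alpha_x / alpha_y
pcorr_ratio = partial_corr(z, x, y) / partial_corr(z, y, x)
correction = np.sqrt((1 - r_yz**2) / (1 - r_xz**2))

print("ratio of regression coefficients:", coef_ratio)
print("ratio of partial correlations:   ", pcorr_ratio)
print("corrected ratio:                 ", pcorr_ratio * correction)
```

In this example the two raw ratios differ, and they agree only after the correction factor is applied, which is why the regression tables and the partial correlations carry related but not identical information.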

Table g shows the values of $\rho(\tilde{N}^K, H|Vol)$ and
$\rho(\tilde{N}^K, Vol|H)$ for the six categories of investors. Consistently
with the previous results, we notice that
$\rho(\tilde{N}^K, H|Vol) > \rho(\tilde{N}^K, Vol|H)$ for governmental and non
profit organizations, while the reverse is true for the other categories of
investors.

In conclusion, volatility and the flux of news are correlated with the
decision to trade of single investors and their relative role, when properly
disentangled, is different for different categories of investors. Governmental
and non profit institutions are the categories for which news and volatility
have the least explanatory power for their presence in the market. Moreover,
these investors are more sensitive to news than to volatility. Households and
companies are much more sensitive to volatility than to news and the variance
of their activity explained by these factors is quite high. Also for financial
institutions, volatility is more important than news, but the explained
variance is relatively smaller. Finally, foreign organizations are more
affected by volatility than by news, but the variance explained by the
regression is quite small.

5.2. Returns and sentiment indicator 

Having verified that news play an important role in the decisions of
single investors to trade, we now focus our attention on the impact of the
sentiment carried by news on the Nokia return and on the decision of single
investors to buy or sell a certain amount of Nokia stock. As sentiment
indicators we consider both the absolute sentiment indicator $S_A(t)$ and the
relative sentiment indicator $S_R(t)$. Both daily sentiment indicators are
correlated with the daily Nokia return on the Nordic Stock Exchange. The
values of the correlation are Corr[$S_A$, Ret] = 0.155 and
Corr[$S_R$, Ret] = 0.118. These values are small but statistically
significant. In fact the average correlation observed for shuffled time series
of $S_A(t)$ and $S_R(t)$ with the Nokia return is smaller than 0.002 with a
standard deviation of 0.026.
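
A shuffling test of this kind can be sketched as follows. The data are synthetic, the function name `shuffled_correlation` and the series length T = 1500 (roughly six years of trading days) are assumptions made for illustration, and the exact shuffling scheme used for the numbers above is not specified in the text.

```python
import numpy as np

def shuffled_correlation(s, ret, n_shuffles=500, seed=0):
    """Mean and standard deviation of Corr[shuffled s, ret] over many shuffles."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    ret = np.asarray(ret, dtype=float)
    # Shuffling s destroys any time alignment with ret, giving a null distribution.
    null = np.array([np.corrcoef(rng.permutation(s), ret)[0, 1]
                     for _ in range(n_shuffles)])
    return null.mean(), null.std()

# Synthetic stand-in series with a weak true correlation (NOT the paper's data).
rng = np.random.default_rng(7)
T = 1500
ret = rng.standard_normal(T)
s = 0.15 * ret + rng.standard_normal(T)

observed = np.corrcoef(s, ret)[0, 1]
null_mean, null_std = shuffled_correlation(s, ret)
z_score = (observed - null_mean) / null_std

print("observed correlation:", observed)
print("shuffled mean and std:", null_mean, null_std)
print("distance from the null in standard deviations:", z_score)
```

For uncorrelated series of length T the standard deviation of the sample correlation is roughly $1/\sqrt{T}$, so a null spread close to 0.026 is what one would expect for a series of about 1500 daily observations.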

We analyze the explanatory role of $S_A$ and of the Nokia return Ret by
considering the linear model

$$\Delta\tilde{N}_A^K(t) = \alpha_{S_A} \tilde{S}_A(t) + \alpha_{Ret} \widetilde{Ret}(t) + \epsilon(t) \tag{10}$$

where $\Delta\tilde{N}_A^K$, $\tilde{S}_A$, and $\widetilde{Ret}$ are
standardized versions with zero mean and unit variance of $\Delta N_A^K$,
$S_A$, and $Ret$, respectively.

In Table [4] we show the values of the $\alpha_{S_A}$ and $\alpha_{Ret}$
coefficients together with the variance of the residuals obtained by ordinary
least squares. As in the previous case, we also report the 5%-95% confidence
interval of each coefficient under the Gaussian hypothesis and from the
bootstrap analysis, and the results are shown for each category of investors
separately. Most of the $\alpha_{S_A}$ coefficients are consistent with zero
within the 5%-95% confidence interval. In fact, for bootstrap confidence
intervals all the $\alpha_{S_A}$ are consistent with zero, while under the
normality hypothesis values (slightly) different from zero are observed for
companies and households. The $\alpha_{Ret}$ coefficient is always
statistically significant and ranges from a minimum value of $-0.653$,
observed for households, to a maximum value of $-0.196$, observed for
governmental organizations. The variance of the residual of
$\Delta\tilde{N}_A^K$ ranges from 57.9% (households) to 95.9% (governmental
organizations), indicating that for some of the categories (companies,
financial institutions, households and foreign organizations) the explanatory
value of the two variables, and especially of the return, is still
significant.

The $\alpha_{Ret}$ coefficients are in several cases (companies, financial
institutions, households and foreign organizations) large and negative,
indicating that, for these categories, the market polarization of trading
actions is strongly anticorrelated with the Nokia return. The majority of
single investors of these categories are therefore buying when the Nokia price
goes down and selling when the price goes up. This is reminiscent of a
contrarian behavior, even if in this case we are considering contemporaneous
rather than lagged correlation between return and decision to buy or sell. A
similar behavior is also observed, although to a less pronounced degree, for
governmental and non-profit organizations. The partial correlations shown in
Table [2] confirm these results (see below for an extended comment).

These results are essentially confirmed by the analysis concerning the
explanatory role of the relative sentiment indicator $S_R$ and of the Nokia
return Ret for the relative variable $\Delta N_R^K$, investigated by
considering the linear model

$$\Delta\tilde{N}_R^K(t) = \alpha_{S_R} \tilde{S}_R(t) + \alpha_{Ret} \widetilde{Ret}(t) + \epsilon(t) \tag{11}$$

where $\Delta\tilde{N}_R^K$, $\tilde{S}_R$, and $\widetilde{Ret}$ are
standardized versions with zero mean and unit variance of $\Delta N_R^K$,
$S_R$, and $Ret$, respectively. By comparing the results of Table [5] with the
results reported in Table [4] we conclude that the observations made for
$\Delta N_A^K$, Ret and $S_A$ are very similar to the ones obtained for the
relative variables $\Delta N_R^K$, Ret and $S_R$, and therefore do not depend
significantly on the specific definition of the sentiment indicator. The only
difference we detect is observed for companies and households: for these
categories the values of the coefficient $\alpha_{S_R}$ are not consistent
with zero within the 5%-95% confidence interval.

The partial correlations shown in Table [4] and [5] indicate that
$|\rho(N_A^K, S_A|Ret)| < |\rho(N_A^K, Ret|S_A)|$ and
$|\rho(N_R^K, S_R|Ret)| < |\rho(N_R^K, Ret|S_R)|$ for all the categories of
investors. The highest values of the partial correlations
$\rho(N_A^K, S_A|Ret)$ and $\rho(N_R^K, S_R|Ret)$ are observed for companies
and households. However, in general, these correlations are quite small in
absolute value (between 5% and 8%), and very close to the noise level. For the
other categories, these partial correlations are even smaller, and not
statistically significant.

The joint analysis of absolute and relative imbalance between buyers and 
sellers leads us to the following conclusions. The activity of governmental 
and non profit organizations is very poorly explained by return and news 
sentiment. Of the two factors, return plays clearly a major role. Households 
and companies are those for which sentiment and returns have the best ex- 
planatory power of their trading action. Return is clearly more important, 
but sentiment also has some explanatory power, especially when one considers
the relative imbalance between buyers and sellers. For financial and foreign 
organizations the variance explained by the regressions is somewhat interme- 
diate between the two pairs of categories above, but in general returns have 
a much higher explanatory power and sentiment plays a negligible role. 

In conclusion, the regression analysis shows that the total flux of news 
is significantly correlated with the decision to trade. This seems to indicate 
that, on a daily time scale, news move investors to trade. However, we find 
that most of the time the sentiment indicator is not significantly correlated
with the imbalance between buyers and sellers. One possible explanation is
that the sentiment indicator does not accurately discriminate good news
from bad news. As mentioned above, the sentiment indicator has a correlation
with returns of around 0.15 and therefore it seems possible that a
more accurate semantic indicator could better capture how good or bad
the news is. Another possibility is that most of the time investors do not
agree on the positivity or negativity of the news and/or that they react in
a heterogeneous way to the same news.

6. Conclusions 

By using a linear regression model to describe the number of investors 
trading the Nokia stock as a linear combination of a proxy of news and a 
proxy of stock volatility, we have shown that the trading activity of all the 
investigated categories of investors is significantly correlated with both the 
flux of news and the daily volatility. The relative role of exogenous (news) and 
endogenous (volatility) factors is, in general, difficult to disentangle because 
the two variables are highly correlated. By assuming a linear model, we 
have shown that the relative relevance of the two variables changes across 
the different categories of investors. The dependence on volatility turned
out to be more pronounced than the dependence on news for companies, financial
institutions, households, and foreign organizations.

The second conclusion concerns the relationship between the sentiment 
of news, Nokia return, and the market polarization towards buying or selling 
for the different investors' categories. Our results show that the sentiment 
time series is correlated with Nokia return. We have also shown that both the 
Nokia return and the sentiment of news explain part of the trading polariza- 
tion dynamics of some categories, according to a linear model. Specifically, 
we observed that all categories have a contrarian-like response to return, and
that companies and households have a positive, although small, reaction to
positive sentiment of news.

Appendix A. Proof of Eq. (9)



Suppose we aim at explaining the behavior of a stochastic variable $y$ with
time samples $\{y_1, ..., y_T\}$ as a linear combination of two correlated
variables $x_1$ and $x_2$ with sample series $\{x_{1,1}, ..., x_{1,T}\}$ and
$\{x_{2,1}, ..., x_{2,T}\}$, respectively.

Synchronous sampling of the three variables is assumed. We focus on the 
linear model: 

$$y = \alpha_1 x_1 + \alpha_2 x_2 + \beta\epsilon, \tag{A.1}$$

where $\epsilon$ is the idiosyncratic term. If we assume (without loss of
generality) that all the variables, including $y$, are standardized to zero
mean and unit variance, then the value of the coefficient $\beta$ is
determined by the following equation for the variance:

$$\langle y^2 \rangle = 1 = \alpha_1^2 + \alpha_2^2 + 2\alpha_1\alpha_2\rho_{1,2} + \beta^2, \tag{A.2}$$

where $\rho_{1,2}$ is the correlation between $x_1$ and $x_2$. We shall
calculate the value of $\beta$ by using this equation afterwards, that is,
once the values of $\alpha_1$ and $\alpha_2$ have been estimated from data.
According to the least squares method, estimates of these two parameters are
obtained by minimizing the following function:

$$f(\alpha_1, \alpha_2) = \sum_{i=1}^{T} (y_i - \alpha_1 x_{1,i} - \alpha_2 x_{2,i})^2. \tag{A.3}$$

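We have

$$\frac{\partial f(\alpha_1, \alpha_2)}{\partial \alpha_1} = -2T \cdot \frac{1}{T} \sum_{i=1}^{T} (y_i - \alpha_1 x_{1,i} - \alpha_2 x_{2,i})\, x_{1,i} = -2T\,(\rho_{1,y} - \alpha_1 - \alpha_2\rho_{1,2}), \tag{A.4}$$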



where $\rho_{1,y}$ and $\rho_{1,2}$ are the estimated correlations between
$x_1$ and $y$ and between $x_1$ and $x_2$, respectively. In the previous
equation, the last equality is based on the assumption that data have also
been standardized. Similarly, by differentiating with respect to $\alpha_2$ we
obtain:

$$\frac{\partial f(\alpha_1, \alpha_2)}{\partial \alpha_2} = -2T \cdot \frac{1}{T} \sum_{i=1}^{T} (y_i - \alpha_1 x_{1,i} - \alpha_2 x_{2,i})\, x_{2,i} = -2T\,(\rho_{2,y} - \alpha_1\rho_{1,2} - \alpha_2), \tag{A.5}$$

where $\rho_{2,y}$ is the estimated correlation between $x_2$ and $y$. Setting
the latter two derivatives equal to zero allows one to estimate $\alpha_1$ and
$\alpha_2$ as

$$\alpha_1 = \frac{\rho_{1,y} - \rho_{2,y}\,\rho_{1,2}}{1 - \rho_{1,2}^2}, \qquad \alpha_2 = \frac{\rho_{2,y} - \rho_{1,y}\,\rho_{1,2}}{1 - \rho_{1,2}^2}. \tag{A.6}$$

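Finally, these results are used in Eq. (A.2) to estimate parameter $\beta$:

$$\beta^2 = 1 - \alpha_1^2 - \alpha_2^2 - 2\alpha_1\alpha_2\rho_{1,2} = \frac{1 - \rho_{1,2}^2 - \rho_{1,y}^2 - \rho_{2,y}^2 + 2\rho_{1,2}\,\rho_{1,y}\,\rho_{2,y}}{1 - \rho_{1,2}^2} = \frac{|\Gamma|}{1 - \rho_{1,2}^2}, \tag{A.7}$$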

where $|\Gamma|$ is the determinant of the correlation matrix of the three
variables $x_1$, $x_2$, and $y$. This result indicates that $\beta^2$ is
always non-negative (as it should be), and it represents the fraction of
variance of $y$ that is not explained by the two variables $x_1$ and $x_2$, as
it is in the standard linear models based on independent variables. The case
of uncorrelated $x_1$ and $x_2$ is simply obtained by setting
$\rho_{1,2} = 0$ in the previous equations. The result is that
$\alpha_1 = \rho_{1,y}$, $\alpha_2 = \rho_{2,y}$ and
$\beta^2 = 1 - \rho_{1,y}^2 - \rho_{2,y}^2$, as expected.
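Hence, the estimates of $\alpha_1$ and $\alpha_2$ are related to the partial
correlations

$$\rho(y, x_1|x_2) = \frac{\rho_{1,y} - \rho_{1,2}\,\rho_{2,y}}{\sqrt{(1 - \rho_{2,y}^2)(1 - \rho_{1,2}^2)}}, \qquad \rho(y, x_2|x_1) = \frac{\rho_{2,y} - \rho_{1,2}\,\rho_{1,y}}{\sqrt{(1 - \rho_{1,y}^2)(1 - \rho_{1,2}^2)}} \tag{A.8}$$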

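through the equations:

$$\alpha_1 = \frac{\sqrt{(1 - \rho_{2,y}^2)(1 - \rho_{1,2}^2)}}{1 - \rho_{1,2}^2}\, \rho(y, x_1|x_2), \qquad \alpha_2 = \frac{\sqrt{(1 - \rho_{1,y}^2)(1 - \rho_{1,2}^2)}}{1 - \rho_{1,2}^2}\, \rho(y, x_2|x_1). \tag{A.9}$$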



In summary:

$$\frac{\alpha_1}{\alpha_2} = \frac{\rho(y, x_1|x_2)}{\rho(y, x_2|x_1)} \sqrt{\frac{1 - \rho_{2,y}^2}{1 - \rho_{1,y}^2}}, \tag{A.10}$$

which is equivalent to Eq. (9).
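
The closed-form results of this appendix can be verified with a short numerical sketch. The function names are hypothetical; the value 0.501 for the correlation between the two regressors is the Corr[H, Vol] quoted in Section 5.1, while the other two correlations are invented for illustration.

```python
import numpy as np

def two_factor_closed_form(r12, r1y, r2y):
    """Least-squares coefficients and residual weight for standardized variables."""
    a1 = (r1y - r2y * r12) / (1 - r12**2)
    a2 = (r2y - r1y * r12) / (1 - r12**2)
    beta2 = 1 - a1**2 - a2**2 - 2 * a1 * a2 * r12
    return a1, a2, beta2

def partial_corrs(r12, r1y, r2y):
    """Partial correlations rho(y, x1 | x2) and rho(y, x2 | x1)."""
    p1 = (r1y - r12 * r2y) / np.sqrt((1 - r2y**2) * (1 - r12**2))
    p2 = (r2y - r12 * r1y) / np.sqrt((1 - r1y**2) * (1 - r12**2))
    return p1, p2

# r12 is taken from the text; r1y and r2y are hypothetical values.
r12, r1y, r2y = 0.501, 0.45, 0.70
a1, a2, beta2 = two_factor_closed_form(r12, r1y, r2y)
p1, p2 = partial_corrs(r12, r1y, r2y)

# Residual weight from the determinant of the 3x3 correlation matrix.
gamma = np.array([[1.0, r12, r1y],
                  [r12, 1.0, r2y],
                  [r1y, r2y, 1.0]])
det_ratio = np.linalg.det(gamma) / (1 - r12**2)

# Ratio of coefficients versus corrected ratio of partial correlations.
ratio_lhs = a1 / a2
ratio_rhs = p1 / p2 * np.sqrt((1 - r2y**2) / (1 - r1y**2))

print("alpha_1, alpha_2, beta^2:", a1, a2, beta2)
print("|Gamma| / (1 - rho_12^2):", det_ratio)
print("alpha_1 / alpha_2:", ratio_lhs, "corrected partial-correlation ratio:", ratio_rhs)
```

Setting `r12 = 0` in `two_factor_closed_form` reproduces the uncorrelated case, in which each coefficient reduces to the corresponding correlation with $y$.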